Abstract

Let x,y be strings of equal length. The Hamming distance h(x,y) between x and y is the 
number of positions in which x and y differ. If x is a cyclic shift of y, we say x and y are 
conjugates. We consider f(x,y), the Hamming distance between the conjugates xy and yx.
Over a binary alphabet f(x,y) is always even, and must satisfy a further technical condition.
By contrast, over an alphabet of size 3 or greater, f(x,y) can take any value between 0 and
|x| + |y|, except 1; furthermore, we can always assume that the smaller string has only one type
of letter. 

1 Introduction 

Let x, y be strings of equal length. We define the Hamming distance h(x, y) between x and y to be
the number of positions in which x and y differ [1]. Thus, for example, h(seven, three) = 4. If x is
a cyclic shift of y, we say x and y are conjugates. Alternatively, x is a conjugate of y if there exist
strings u, v with x = uv and y = vu. For example, x = enlist and y = listen are conjugates;
take u = en, v = list. 

In this paper we consider the Hamming distance for conjugates. For all strings x, y (not
necessarily of the same length) we define f(x,y) = h(xy,yx). If f(x,y) = 0, then xy = yx, and this
equation holds if and only if both x and y are powers of some string z [2]. On the other hand, there
are no solutions to the equation f(x,y) = 1. For as a referee of an earlier version of this paper
noted, if h(xy, yx) = 1, then we can write xy = uav, yx = ubv for some strings u, v and letters a, b
with a ≠ b. Thus xy contains one more a than yx, which is impossible, since xy and yx clearly
contain the same number of occurrences of each letter. 
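
The two definitions are easy to experiment with. The following is a minimal Python sketch, not part of the original argument; the helper names hamming, f and attained_values are our own. It computes h and f directly from the definitions and checks by exhaustive search over short strings that the value 1 never occurs.

```python
from itertools import product

def hamming(x: str, y: str) -> int:
    """Number of positions at which the equal-length strings x and y differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))

def f(x: str, y: str) -> int:
    """Hamming distance between the conjugates xy and yx."""
    return hamming(x + y, y + x)

def attained_values(alphabet: str, max_len: int) -> set:
    """All values f(x, y) over nonempty x, y with |x|, |y| <= max_len."""
    words = ["".join(w) for n in range(1, max_len + 1)
             for w in product(alphabet, repeat=n)]
    return {f(x, y) for x in words for y in words}

print(hamming("seven", "three"))        # 4
print(f("en", "list"))                  # enlist vs. listen: 6
print(1 in attained_values("012", 3))   # False
```

The search is only a sanity check on short strings; the counting argument above is what rules out f(x,y) = 1 in general.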

These two examples suggest trying to determine for which k the equation f(x, y) = k is solvable. 
Over an alphabet of size 3 or greater, we show that f(x, y) can take any value between 0 and |x| + |y|,
except 1; furthermore, we can always assume that the smaller string has only one type of letter.
However, over a binary alphabet, f(x, y) is always even, and must also satisfy a further technical
condition. 



2 Hamming distance for non-binary alphabets 

Theorem 1. Let Σ be an alphabet with at least 3 letters, say 0, 1, 2. Suppose m, n, k are integers
with 1 ≤ m ≤ n and 0 ≤ k ≤ m + n.

(a) If m < n, there exist strings x, y ∈ Σ* with |x| = m, |y| = n, such that h(xy,yx) = k if and
only if k ≠ 1.

(b) If m = n, then there exist strings x, y ∈ Σ* with |x| = m, |y| = n, such that h(xy,yx) = k if
and only if k is even.

Furthermore, in both cases, we can always choose x = 0^m.

Proof. We define x = 0^m and y = s(m, n, k), where s(m, n, k) is defined by the following recursion:

s(m, n, 2t)    =  0^{n-t} 1^t,                        if 0 ≤ t ≤ m ≤ n;                (1)
s(m, n, 2t+1)  =  0^{n-m-1} 1 0^{m-t} 1^{t-1} 2,      if 1 ≤ t ≤ m < n;                (2)
s(m, n, k)     =  0^{m+n-k} s(m, k-m, k),             if 2m < k < m + n;               (3)
s(m, n, m+n)   =  (1^m 2^m)^j 1^r,                    if n = 2mj + r, 0 ≤ r < m;
               =  (1^m 2^m)^j 1^m 2^r,                if n = (2j+1)m + r, 0 ≤ r < m.   (4)

First we prove that these identities suffice to calculate s(m, n, k) for 0 ≤ k ≤ m + n, k ≠ 1,
when m < n, and for 0 ≤ k ≤ m + n, k even, when m = n.

Suppose m < n. Then there are two cases: either k ≤ 2m + 1 or k > 2m + 1. Suppose
k ≤ 2m + 1. If k is even, in which case k = 2t, 0 ≤ t ≤ m, we use Eq. (1). If k is odd, in which
case k = 2t + 1, 1 ≤ t ≤ m, we use Eq. (2).

Now suppose k > 2m + 1. Then if k < m + n, we use Eq. (3), which reduces the case to one
where k = m + n. In this latter case, we use Eq. (4).

If m = n, then we use Eq. (1) if k ≤ 2m, and Eqs. (3) and (4) if k > 2m. Thus the identities
(1)-(4) cover all the cases. Furthermore, if |x| = |y| then h(xy,yx) = 2h(x,y), so k must be even.

Now we prove that f(0^m, s(m,n,k)) = k. We start with (1). Comparing xy = 0^m 0^{n-t} 1^t with
yx = 0^{n-t} 1^t 0^m, we see that since m ≥ t, each 1 is paired with a 0 in the other string, and all other
symbols are 0, so f(0^m, 0^{n-t} 1^t) = 2t.

Now consider (2). By comparing xy = 0^m 0^{n-m-1} 1 0^{m-t} 1^{t-1} 2 with yx = 0^{n-m-1} 1 0^{m-t} 1^{t-1} 2 0^m
we see that, since t ≤ m, the last t symbols of xy are different from 0, while the last t symbols of
yx are 0. The block 1^{t-1} 2 in yx is matched against 0^{t-1} 1 in xy. And the first 1 in yx is matched
against 0 in xy. The total is 2t + 1 mismatches, as needed.

Now consider (3). Adding 0's here to the front of y = s(m, k - m, k) does not change the number
of mismatches.

Finally, consider (4). In this case, the 0's in xy match against 1's in yx. The alternating blocks
of 1's and 2's in xy either match against blocks of the other symbol in yx (1's against 2's and 2's
against 1's), or against the 0's at the end of yx. Thus every symbol mismatches, and there are
m + n of them.

This completes the proof. □ 
Examples. We consider some examples.



s(10, 45, 55) = 1^{10} 2^{10} 1^{10} 2^{10} 1^5;

s(10, 11, 21) = 1^{10} 2;

s(10, 10, 20) = 1^{10};

s(10, 20, 12) = 0^{14} 1^6;

s(10, 20, 13) = 0^9 1 0^4 1^5 2.
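
The recursion for s(m, n, k) translates almost line for line into code. The sketch below is one possible Python transcription, under the assumption that Eqs. (1)-(4) read as in the proof of Theorem 1; the function names and the order in which the cases are tested are our own choices. The loop at the end checks f(0^m, s(m, n, k)) = k for all small admissible triples.

```python
def f(x: str, y: str) -> int:
    """Hamming distance between the conjugates xy and yx."""
    return sum(a != b for a, b in zip(x + y, y + x))

def s(m: int, n: int, k: int) -> str:
    """A string y of length n over {0,1,2} with f(0^m, y) = k."""
    if not (1 <= m <= n and 0 <= k <= m + n):
        raise ValueError("need 1 <= m <= n and 0 <= k <= m + n")
    if k == 1 or (m == n and k % 2 == 1):
        raise ValueError("no solution exists for this (m, n, k)")
    t = k // 2
    if k % 2 == 0 and t <= m:                    # Eq. (1)
        return "0" * (n - t) + "1" * t
    if k % 2 == 1 and t <= m:                    # Eq. (2); here m < n
        return "0" * (n - m - 1) + "1" + "0" * (m - t) + "1" * (t - 1) + "2"
    if k < m + n:                                # Eq. (3): pad with 0's
        return "0" * (m + n - k) + s(m, k - m, k)
    # Eq. (4): k = m + n, alternate blocks 1^m, 2^m, then a partial block
    blocks, r = divmod(n, m)
    y = "".join(("1" if i % 2 == 0 else "2") * m for i in range(blocks))
    return y + ("1" if blocks % 2 == 0 else "2") * r

# Exhaustive check of Theorem 1's construction on small parameters.
for m in range(1, 9):
    for n in range(m, 13):
        for k in range(m + n + 1):
            admissible = (k != 1) if m < n else (k % 2 == 0)
            if admissible:
                y = s(m, n, k)
                assert len(y) == n and f("0" * m, y) == k

print(f("0" * 10, s(10, 20, 13)))   # 13
```

Where two equations apply to the same triple, as for s(10, 11, 21), the case order here picks Eq. (2); Eq. (4) yields the same string 1^{10} 2.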



3 The case of a binary alphabet 

Suppose x, y are strings over the alphabet {0,1}. We first observe that f(x,y) must always be
even. For, as the referee of an earlier version observed, if xy and yx differ at an odd number of
positions, then one of them must contain at least one more 0 than the other, which is impossible.

However, to completely characterize the solutions to f(x,y) = k, there is one additional
condition that needs to be imposed:

Theorem 2. Let m, n be integers with 1 ≤ m ≤ n. Then there exist binary strings x, y with
|x| = m, |y| = n, and f(x,y) = k if and only if

(a) 0 ≤ k ≤ m + n;

(b) k is even;

(c) k ≤ m + n - gcd(m, n) if (m + n)/gcd(m, n) is odd.

Proof. Suppose f(x, y) = k is solvable. We have already seen that conditions (a) and (b) must hold.
By comparing xy to yx we see that each symbol is potentially related to (m + n)/gcd(m, n) - 1
other symbols. For example, writing xy = z, and using indexing beginning at 0, we see that z[0]
is the first symbol of xy and z[m] is the first symbol of yx. Likewise, in position m, xy has z[m]
while yx has z[2m mod (m + n)], and so forth. That is, the positions of x and y split into
gcd(m, n) cycles of length (m + n)/gcd(m, n); adjacent elements of a cycle line up with each other
in xy and yx.

If a cycle is of even length, then over a binary alphabet we can force all the symbols to disagree, 
by choosing them to be 0 and 1 alternately. If a cycle is of odd length, this is impossible. More
precisely, the number of adjacent pairs that differ in a cycle must be even, since going once
around the cycle returns to the starting symbol.

Therefore, if (m + n)/gcd(m, n) is odd, at most (m + n)/gcd(m, n) - 1 pairs of any cycle
can disagree. Since there are gcd(m, n) cycles, the highest Hamming weight we can achieve is
m + n - gcd(m, n). Thus conditions (a)-(c) must hold.

Now suppose conditions (a)-(c) hold. We show how to construct x, y such that f(x,y) = k.
Define g to be (m + n)/gcd(m, n) if this quantity is even; otherwise let g = (m + n)/gcd(m, n) - 1.
Using the division theorem, divide k by g, obtaining a quotient q and a remainder r. Since k and
g are even, so is r. In the first q of the cycles, let the symbols alternate between 0 and 1. In the
(q + 1)'th cycle, let the first r symbols alternate and set all the remaining symbols to be the same.
Make every symbol in any further cycle the same. The resulting string, now split up between x and y,
has qg + r = k positions where xy fails to match yx, as desired. □
Example. Suppose (m, n, k) = (6, 9, 10). In this case, the cycles are

C0 = (z[0], z[6], z[12], z[3], z[9]);
C1 = (z[1], z[7], z[13], z[4], z[10]);
C2 = (z[2], z[8], z[14], z[5], z[11]).

Now (m + n)/ gcd(m, n) = 5, which is odd. Hence each cycle can give us at most 4 mismatches. 
We generate 4 mismatches with each of the first two cycles by alternating 0 and 1, and generate 2
mismatches with the last cycle. This gives 

C0 = (z[0], z[6], z[12], z[3], z[9]) = (0, 1, 0, 1, 0);
C1 = (z[1], z[7], z[13], z[4], z[10]) = (0, 1, 0, 1, 0);
C2 = (z[2], z[8], z[14], z[5], z[11]) = (0, 1, 0, 0, 0).

This gives z = 000110111000000 and so x = 000110 and y = 111000000.
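
The cycle construction from the proof of Theorem 2 can also be carried out mechanically. The following Python sketch is one way to do it; the name binary_pair is hypothetical, and we assume that symbols not forced to alternate are set to 0, as in the example. Cycle c starts at position c and advances by m modulo m + n.

```python
from math import gcd

def f(x: str, y: str) -> int:
    """Hamming distance between the conjugates xy and yx."""
    return sum(a != b for a, b in zip(x + y, y + x))

def binary_pair(m: int, n: int, k: int):
    """Binary x, y with |x| = m, |y| = n and f(x, y) = k, built cycle by cycle."""
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    d = gcd(m, n)                      # number of cycles
    length = (m + n) // d              # length of each cycle
    g = length if length % 2 == 0 else length - 1   # max mismatches per cycle
    if not (0 <= k <= g * d and k % 2 == 0):
        raise ValueError("conditions (a)-(c) of Theorem 2 fail")
    q, r = divmod(k, g)
    z = ["0"] * (m + n)
    for c in range(d):
        # How many leading symbols of this cycle alternate 0, 1, 0, 1, ...
        alternating = length if c < q else (r if c == q else 0)
        pos = c
        for j in range(length):
            if j < alternating and j % 2 == 1:
                z[pos] = "1"
            pos = (pos + m) % (m + n)
    word = "".join(z)
    return word[:m], word[m:]

x, y = binary_pair(6, 9, 10)
print(x, y, f(x, y))   # 000110 111000000 10
```

For (m, n, k) = (6, 9, 10) this gives g = 4, q = 2 and r = 2, so the first two cycles alternate fully and the third alternates in its first two symbols only.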

It is interesting to note that, in contrast to the case of large alphabets, in the binary case, even
if x and y exist with |x| = m, |y| = n, m < n, and f(x, y) = k, it may not be possible to achieve this
by choosing x = 0^m. For example, for (m, n, k) = (3, 5, 8), the lexicographically smallest solution
is x = 010, y = 10101.
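
A claim of this kind can be confirmed by exhaustive search, since there are only 2^3 · 2^5 candidate pairs. The sketch below is a brute-force check in Python with a hypothetical helper binary_solutions; it lists every binary solution for given (m, n, k) in lexicographic order of (x, y).

```python
from itertools import product

def f(x: str, y: str) -> int:
    """Hamming distance between the conjugates xy and yx."""
    return sum(a != b for a, b in zip(x + y, y + x))

def binary_solutions(m: int, n: int, k: int) -> list:
    """All binary pairs (x, y) with |x| = m, |y| = n and f(x, y) = k."""
    xs = ["".join(w) for w in product("01", repeat=m)]   # lexicographic order
    ys = ["".join(w) for w in product("01", repeat=n)]
    return [(x, y) for x in xs for y in ys if f(x, y) == k]

print(binary_solutions(3, 5, 8))   # [('010', '10101'), ('101', '01010')]
```

Here (m + n)/gcd(m, n) = 8, so there is a single cycle and all of its symbols must alternate; this is why only two solutions appear and neither has x = 000.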